Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games

نویسندگان

Xiaoxiao Guo

Satinder P. Singh

Richard L. Lewis

Honglak Lee

چکیده

Monte Carlo Tree Search (MCTS) methods have proven powerful in planning for sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and sampling trajectories are limited or when the rewards are sparse. We present an adaptation of PGRD (policy-gradient for rewarddesign) for learning a reward-bonus function to improve UCT (a MCTS algorithm). Unlike previous applications of PGRD in which the space of reward-bonus functions was limited to linear functions of hand-coded state-action-features, we use PGRD with a multi-layer convolutional neural network to automatically learn features from raw perception as well as to adapt the non-linear reward-bonus function parameters. We also adopt a variance-reducing gradient method to improve PGRD’s performance. The new method improves UCT’s performance on multiple ATARI games compared to UCT without the reward bonus. Combining PGRD and Deep Learning in this way should make adapting rewards for MCTS algorithms far more widely and practically applicable than before.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy-selection. The Arcade Learning Environment (ALE) provides a set of Atari games that represent a useful benchmark set of such applications. A recent breakthrough in combining model-free reinforcement l...

متن کامل

Playing the Right Atari

We experimented a simple yet powerful optimization for Monte-Carlo Go tree search. It consists in dealing appropriately with strings that have two liberties. The heuristic is contained in one page of code and the Go program that uses it improves from 50 % of won games against Gnugo 3.6 to 76 % of won games.

متن کامل

Fractal AI: A fragile theory of intelligence

Fractal AI is a theory for general artificial intelligence. It allows to derive new mathematical tools that constitute the foundations for a new kind of stochastic calculus, by modelling information using cellular automaton-like structures instead of smooth functions. In the repository included we are presenting a new Agent, derived from the first principles of the theory, which is capable of s...

متن کامل

Neurohex: A Deep Q-learning Hex Agent

DeepMind’s recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman level agents — e.g. for Atari games via deep Q-learning and for the game of Go via other deep Reinforcement Learning methods — raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after s...

متن کامل

High-Diversity Monte-Carlo Tree Search

For combinatorial search in single-player games nested Monte-Carlo search is an apparent alternative to algorithms like UCT that are applied in two-player and general games. To trade exploration with exploitation the randomized search procedure intensifies the search with increasing recursion depth. If a concise mapping from states to actions is available, the integration of policy learning has...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games

نویسندگان

چکیده

منابع مشابه

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

Playing the Right Atari

Fractal AI: A fragile theory of intelligence

Neurohex: A Deep Q-learning Hex Agent

High-Diversity Monte-Carlo Tree Search

عنوان ژورنال:

اشتراک گذاری